Generative Adversarial Network and Its Application in Aerial Vehicle Detection and Biometric Identification System
In recent years, generative adversarial networks (GANs) have shown great potential in advancing the state of the art in many areas of computer vision, most notably in image synthesis and manipulation tasks. A GAN is a generative model that simultaneously trains a generator and a discriminator in an adversarial manner to produce realistic synthetic data by capturing the underlying data distribution. Because GANs can generate high-quality, visually pleasing results, we apply them to super-resolution and image-to-image translation to address vehicle detection in low-resolution aerial images and cross-spectral cross-resolution iris recognition. First, we develop a Multi-scale GAN (MsGAN) with multiple intermediate outputs, which progressively learns the details and features of high-resolution aerial images at different scales. The super-resolved aerial images are then fed to a You Only Look Once version 3 (YOLO-v3) object detector, and the detection loss is optimized jointly with a super-resolution loss to emphasize target vehicles sensitive to the super-resolution process. A further problem remains when detection takes place at night or in a dark environment, which requires an infrared (IR) detector, and training such a detector needs a large number of IR images. To address these challenges, we develop a GAN-based joint cross-modal super-resolution framework in which low-resolution (LR) IR images are translated and super-resolved to high-resolution (HR) visible (VIS) images before detection is applied. This approach significantly improves the accuracy of aerial vehicle detection by leveraging the benefits of super-resolution techniques in a cross-modal domain. Second, to increase the performance and reliability of deep learning-based biometric identification systems, we develop conditional GAN (cGAN) based cross-spectral cross-resolution iris recognition and offer two different frameworks.
The first approach trains a cGAN to jointly translate and super-resolve LR near-infrared (NIR) iris images to HR VIS iris images, so that cross-spectral cross-resolution iris matching is performed at the same resolution and within the same spectrum. In the second approach, we design a coupled GAN (cpGAN) architecture to project both VIS and NIR iris images into a low-dimensional embedding domain. The goal of this architecture is to ensure maximum pairwise similarity between the feature vectors from the two iris modalities of the same subject. We also propose a pose attention-guided coupled profile-to-frontal face recognition network to learn discriminative and pose-invariant features in an embedding subspace. To show that the feature vectors learned in this deep subspace can be used for tasks beyond recognition, we implement a GAN architecture that reconstructs a frontal face from its corresponding profile face. This capability can be used in various face analysis tasks, such as emotion detection and expression tracking, where having a frontal face image can improve accuracy and reliability. Overall, our research has shown its efficacy by achieving new state-of-the-art results in extensive experiments on publicly available datasets reported in the literature.
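The coupling goal of the second approach (high similarity between VIS and NIR feature vectors of the same subject) can be illustrated with a small sketch. The abstract does not state the exact loss, so the following assumes a standard margin-based contrastive loss on Euclidean distance. The names `euclidean` and `coupling_loss` and the `margin` parameter are hypothetical and are not taken from the work.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def coupling_loss(vis_feat, nir_feat, same_subject, margin=1.0):
    """Margin-based contrastive loss (an assumed stand-in for the coupling term).

    Genuine pairs (same subject) are penalized by their squared distance,
    which pulls them together. Imposter pairs are penalized only while
    they are closer than `margin`, which pushes them apart.
    """
    d = euclidean(vis_feat, nir_feat)
    if same_subject:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Toy two-dimensional embeddings; the values are illustrative only.
vis = [0.0, 0.0]
print(coupling_loss(vis, [3.0, 4.0], same_subject=True))    # genuine pair, far apart
print(coupling_loss(vis, [0.0, 0.5], same_subject=False))   # imposter pair, too close
print(coupling_loss(vis, [3.0, 4.0], same_subject=False))   # imposter pair, beyond margin
```

In a real system the two feature vectors would come from the VIS and NIR branches of the coupled network, and this term would be combined with the adversarial losses.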
Joint-SRVDNet: Joint Super Resolution and Vehicle Detection Network
In many domestic and military applications, aerial vehicle detection and
super-resolution algorithms are frequently developed and applied independently.
However, aerial vehicle detection on super-resolved images remains a
challenging task due to the lack of discriminative information in the
super-resolved images. To address this problem, we propose a Joint
Super-Resolution and Vehicle Detection Network (Joint-SRVDNet) that tries to
generate discriminative, high-resolution images of vehicles from low-resolution
aerial images. First, aerial images are up-scaled by a factor of 4x using a
Multi-scale Generative Adversarial Network (MsGAN), which has multiple
intermediate outputs with increasing resolutions. Second, a detector is trained
on super-resolved images that are upscaled by a factor of 4x using the MsGAN
architecture. Finally, the detection loss is minimized jointly with the
super-resolution loss to encourage the target detector to be sensitive to the
subsequent super-resolution training. The network jointly learns hierarchical
and discriminative features of targets and produces optimal super-resolution
results. We perform both quantitative and qualitative evaluation of our proposed
network on the VEDAI, xView and DOTA datasets. The experimental results show that
our proposed framework achieves better visual quality than the state-of-the-art
methods for aerial super-resolution with 4x up-scaling factor and improves the
accuracy of aerial vehicle detection.
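The joint objective described above can be sketched as a weighted sum of a multi-scale super-resolution loss and a detection loss. This is a minimal illustration under stated assumptions. The abstract does not give the form of the super-resolution loss or the weighting, so an L1 pixel loss and a balancing factor `det_weight` are assumed here. The function names are hypothetical, and the detection loss is treated as a scalar computed elsewhere by the detector.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equal-length flattened images."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def joint_loss(sr_outputs, hr_targets, detection_loss, det_weight=1.0):
    """Multi-scale super-resolution loss plus a weighted detection loss.

    sr_outputs and hr_targets hold one flattened image per intermediate
    scale (for example 2x and 4x). det_weight is an assumed balancing factor.
    """
    sr_loss = sum(l1_loss(p, t) for p, t in zip(sr_outputs, hr_targets))
    return sr_loss + det_weight * detection_loss

# Toy example: two scales with 2 and 4 pixels; values are illustrative only.
outputs = [[0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
targets = [[0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
total = joint_loss(outputs, targets, detection_loss=2.0, det_weight=0.5)
print(total)  # 0.5 + 0.5 from the two scales, plus 0.5 * 2.0 -> 2.0
```

Because both terms feed one scalar, gradients from the detector would reach the generator in an actual training setup, which is what lets the super-resolution output become more discriminative for detection.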
Information Maximization for Extreme Pose Face Recognition
In this paper, we seek to draw connections between the frontal and profile
face images in an abstract embedding space. We exploit this connection using a
coupled-encoder network to project frontal/profile face images into a common
latent embedding space. The proposed model forces the similarity of
representations in the embedding space by maximizing the mutual information
between two views of the face. The proposed coupled-encoder benefits from three
contributions for matching faces with extreme pose disparities. First, we
leverage our pose-aware contrastive learning to maximize the mutual information
between frontal and profile representations of identities. Second, a memory
buffer, which consists of latent representations accumulated over past
iterations, is integrated into the model so it can refer to far more instances
than a single mini-batch holds. Third, a novel pose-aware adversarial
domain adaptation method forces the model to learn an asymmetric mapping from
profile to frontal representation. In our framework, the coupled-encoder learns
to enlarge the margin between the distribution of genuine and imposter faces,
which results in high mutual information between different views of the same
identity. The effectiveness of the proposed model is investigated through
extensive experiments, evaluations, and ablation studies on four benchmark
datasets, and comparison with the compelling state-of-the-art algorithms.
Comment: International Joint Conference on Biometrics (IJCB 2022)
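The first two contributions of this abstract (contrastive learning between frontal and profile representations, and a memory buffer of past representations) can be sketched as follows. The abstract does not specify the pose-aware loss, so this sketch assumes InfoNCE, a common contrastive objective that lower-bounds mutual information, with negatives drawn from a fixed-size first-in, first-out buffer. The names `cosine`, `info_nce`, `memory` and the `temperature` value are hypothetical, and feature vectors are assumed to be non-zero.

```python
import math
from collections import deque

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(profile, frontal, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive pair among all pairs."""
    logits = [cosine(profile, frontal) / temperature]
    logits += [cosine(profile, n) / temperature for n in negatives]
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]

# Memory buffer of representations from past iterations (oldest dropped first).
memory = deque(maxlen=4)
for past in ([1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]):
    memory.append(past)

# Toy embeddings; values are illustrative only.
loss_aligned = info_nce([0.0, 1.0], [0.0, 1.0], list(memory))
loss_misaligned = info_nce([0.0, 1.0], [1.0, 0.0], list(memory))
print(loss_aligned < loss_misaligned)  # True
```

The buffer decouples the number of negatives from the mini-batch size: each iteration would append its new representations, and `maxlen` discards the oldest entries automatically.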